** Data reading and variable selection from raw data
** General Social Survey USA 1972 to 2014


** 01. Reading data **

cap log close
clear all
set more off
cd /*insert your work directory here*/
#delimit ;

   infix
      year     1 - 20
      degree   21 - 40
      padeg    41 - 60
      madeg    61 - 80
      sex      81 - 100
      cohort   101 - 120
      ballot   121 - 140
      prestg10 141 - 160
      papres10 161 - 180
      mapres10 181 - 200
      isco08   201 - 220
      paisco08 221 - 240
      maeduc   241 - 260
      paeduc   261 - 280
      id_      281 - 300
      occ      301 - 320
      occ10    321 - 340
      indus10  341 - 360
      paocc10  361 - 380
      paind10  381 - 400
      maocc10  401 - 420
      maind10  421 - 440
      sibs     441 - 460
      age      461 - 480
      educ     481 - 500
      maisco08 501 - 520
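using GSS.dat;

* Restore the default end-of-line delimiter; otherwise every command below
* would be parsed as part of one long statement.
#delimit cr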



** 02. Constructing year and country variables **

lab var year "survey year"

ge country=840
lab var country "ISO country code"
// USA: 840 (see "ISO Country Codes.pdf")


** 03. ID variables **

ge pid=id_
lab var pid "person id"
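
* Optional sanity check, not part of the original script. It assumes that
* GSS respondent ids are unique within each survey year, so that year and
* pid together identify one observation. -capture noisily- prints any
* violation without stopping the do-file.
capture noisily isid year pid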


** 04. Basic Demographics (Sex and Age/birth year) **

lab var sex "sex"
lab def sex 1 "male" 2 "female"
lab val sex sex


lab var age "age"

rename cohort birthyr

label variable birthyr   "Year of birth"

** 05. Siblings **

rename sibs nsibs

label variable nsibs "Number of brothers and sisters"


//missing
recode nsibs (98=.) (99=.)

** 06. Own education **

rename degree educ_cat

rename educ educ_yrs


** label values of degree

label define gsp001x 9 "No answer" 8 "Don't know" 7 "Not applicable" 4 "Graduate" 3 "Bachelor" 2 "Junior college" 1 "High school" 0 "Lt high school"

label values educ_cat  gsp001x
   

//label respondent education

label variable educ_cat   "Rs highest degree"

label variable educ_yrs    "Highest year of school completed"

//missing

recode educ_yrs (97/99=.)
recode educ_cat (8/9=.)


** 07. Parents' education: Father and/or Mother **

rename padeg faeduc_cat
rename paeduc faeduc_yrs

rename madeg moeduc_cat
rename maeduc moeduc_yrs

*label parent's education values

label values faeduc_cat gsp001x
label values moeduc_cat gsp001x

*label parental education variables

lab var faeduc_cat "father's education level"
lab var faeduc_yrs "father's years of education"

lab var moeduc_cat "mother's education level"
lab var moeduc_yrs "mother's years of education"
// missing
recode moeduc_yrs faeduc_yrs (97/99=.)
recode moeduc_cat faeduc_cat (8/9=.)


** 08. Own occupation **

rename isco08 occ_ISCO
lab var occ_ISCO "current occupation_ISCO08"


// missing
recode occ_ISCO  (9997/9999=.) (0=.)


** 09. Parents' occupation **

rename paisco08 faocc_ISCO
rename maisco08 moocc_ISCO

lab var faocc_ISCO "R's father's occupation, 2010 census & 2008 isco code"
lab var moocc_ISCO "R's mother's occupation, 2010 census & 2008 isco code"

//missing
recode faocc_ISCO moocc_ISCO (9997/9999=.) (0=.)


** 10. ISCED Education Harmonization**

ge educ_ISCED = .
replace educ_ISCED = 000 if educ_yrs < 4
replace educ_ISCED = 100 if educ_yrs > 3 & educ_yrs < 7
replace educ_ISCED = 200 if educ_yrs > 6 & educ_yrs < 12
replace educ_ISCED = 300 if educ_cat == 1
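* Stata treats missing as larger than any number, so "educ_yrs > 12" alone
* would also match missing educ_yrs; "educ_yrs < ." excludes those cases.
replace educ_ISCED = 400 if educ_yrs > 12 & educ_yrs < . & educ_cat == 1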
replace educ_ISCED = 500 if educ_cat == 2
replace educ_ISCED = 600 if educ_cat == 3
replace educ_ISCED = 700 if educ_cat == 4
replace educ_ISCED = 800 if educ_yrs > 17 & educ_yrs < . & educ_cat == 4

lab var educ_ISCED "Respondent's Education - ISCED"

ge faeduc_ISCED = .
replace faeduc_ISCED = 000 if faeduc_yrs < 4
replace faeduc_ISCED = 100 if faeduc_yrs > 3 & faeduc_yrs < 7
replace faeduc_ISCED = 200 if faeduc_yrs > 6 & faeduc_yrs < 12
replace faeduc_ISCED = 300 if faeduc_cat == 1
replace faeduc_ISCED = 400 if faeduc_yrs > 12 & faeduc_yrs < . & faeduc_cat == 1
replace faeduc_ISCED = 500 if faeduc_cat == 2
replace faeduc_ISCED = 600 if faeduc_cat == 3
replace faeduc_ISCED = 700 if faeduc_cat == 4
replace faeduc_ISCED = 800 if faeduc_yrs > 17 & faeduc_yrs < . & faeduc_cat == 4

lab var faeduc_ISCED "Father's Education - ISCED"

ge moeduc_ISCED = .
replace moeduc_ISCED = 000 if moeduc_yrs < 4
replace moeduc_ISCED = 100 if moeduc_yrs > 3 & moeduc_yrs < 7
replace moeduc_ISCED = 200 if moeduc_yrs > 6 & moeduc_yrs < 12
replace moeduc_ISCED = 300 if moeduc_cat == 1
replace moeduc_ISCED = 400 if moeduc_yrs > 12 & moeduc_yrs < . & moeduc_cat == 1
replace moeduc_ISCED = 500 if moeduc_cat == 2
replace moeduc_ISCED = 600 if moeduc_cat == 3
replace moeduc_ISCED = 700 if moeduc_cat == 4
replace moeduc_ISCED = 800 if moeduc_yrs > 17 & moeduc_yrs < . & moeduc_cat == 4

lab var moeduc_ISCED "Mother's Education - ISCED"
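
* Optional value labels, not part of the original script. This is a sketch
* that ASSUMES the codes 0-800 above stand for ISCED 2011 levels 0-8
* multiplied by 100; the label name "isced" is hypothetical. Adjust or drop
* if a different ISCED version was intended.
label define isced 0 "Less than primary" 100 "Primary" ///
	200 "Lower secondary" 300 "Upper secondary" ///
	400 "Post-secondary non-tertiary" 500 "Short-cycle tertiary" ///
	600 "Bachelor or equivalent" 700 "Master or equivalent" ///
	800 "Doctoral or equivalent"
label values educ_ISCED faeduc_ISCED moeduc_ISCED isced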

log using /*insert your work directory here*/, replace text

** Data reading and variable selection from raw data
** USA GSS 1972-2014

** Sex **
tab sex

** Age, Birth Year **
sum age birthyr, d

** Siblings **
sum nsibs, d

** R's Own Education **
tab1 educ_cat educ_yrs educ_ISCED
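
* Optional check, not part of the original script: cross-tabulate the
* source degree variable against the derived ISCED code to see how each
* degree category was mapped, including observations left missing.
tab educ_cat educ_ISCED, missing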

** Parental Education **
tab1 faeduc_cat moeduc_cat faeduc_yrs moeduc_yrs faeduc_ISCED moeduc_ISCED

** R's Own Occupation **
tab1 occ_ISCO

** Parental Occupation **
tab1 faocc_ISCO moocc_ISCO

log close

** 11. Keep the identified variables only **

keep year country pid sex age birthyr ///
	 nsibs ///
	 educ_cat educ_yrs educ_ISCED faeduc_cat moeduc_cat faeduc_yrs moeduc_yrs faeduc_ISCED moeduc_ISCED ///
	 occ_ISCO ///
	 faocc_ISCO moocc_ISCO


** 12. Save the Data File **

saveold /*insert your work directory here*/, replace


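** 13. Creating separate files for each wave **

cd /*insert your work directory here*/
use /*read your data here*/
preserve
foreach i of numlist 1972/2014 {
	keep if year == `i'
	* The GSS was not fielded in every year, so skip years with no
	* observations instead of saving empty files
	if _N > 0 {
		* replace lets the do-file be rerun without a "file already exists" error
		save USA`i', replace
	}
	* reload the full data set and keep it preserved for the next year
	restore, preserve
}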


